Two-Sample t-Test

Published

December 18, 2023

Introduction

The two-sample t-test is a fundamental tool used to determine if there is a significant difference between the means of two independent groups. This test is widely used in various fields, from medical research to business analytics, to compare different groups and draw meaningful conclusions.

What is a Two-Sample t-Test?

A two-sample t-test, also known as an independent t-test, compares the means of two independent groups to determine if they are statistically different from each other. This test is particularly useful when comparing the effects of different treatments, analyzing differences between two populations, or evaluating changes between two distinct groups.

When to Use a Two-Sample t-Test?

A two-sample t-test is appropriate when:

You have two independent samples of continuous data.

You want to compare the means of these two samples.

The data in each group is approximately normally distributed.

Hypotheses in Two-Sample t-Test

Null Hypothesis (H0): The means of the two groups are equal.

Alternative Hypothesis (H1): The means of the two groups are not equal.

Types of Two-Sample t-Tests

Independent Two-Sample t-Test:

Assumes that the two samples are independent of each other.

Paired Two-Sample t-Test:

Assumes that the two samples are related or paired in some way, such as before-and-after measurements on the same subjects.

Independent Two-Sample t-Test

Assumptions of Two-Sample t-Test

Normality: The data in each group should be approximately normally distributed.

Independence: The observations in each group should be independent of each other.

Homogeneity of Variances: The variances of the two groups should be equal (this can be checked using Levene’s test). If they are not, Welch’s t-test, which does not assume equal variances, is the usual alternative.
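For reference, when the equal-variance assumption holds, the test statistic is usually written with a pooled variance. The notation below is introduced here for illustration and is not from the original text: $\bar{x}_i$, $s_i^2$ and $n_i$ stand for the mean, sample variance and size of group $i$.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\text{df} = n_1 + n_2 - 2
```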

Example in R

Let’s go through a practical example to understand how to perform an independent two-sample t-test in R. Suppose we have two groups of students, one that received traditional classroom instruction and another that received online instruction. We want to test if there is a significant difference in their test scores.

Code

# Packages
library(tidyverse)    # data wrangling (pivot_longer, group_by, %>%)
library(car)          # leveneTest()
library(ggstatsplot)  # ggbetweenstats(), ggwithinstats()
# Sample data: students' scores
classroom_scores <- c(78, 82, 69, 71, 85, 79, 77, 68, 74, 81)
online_scores <- c(85, 88, 76, 80, 89, 84, 82, 78, 86, 83)
data <- data.frame(classroom_scores, online_scores)
# Reshape to long format: one row per score, with the teaching method as a column
data <- data %>%
  pivot_longer(cols = c(1, 2),
               names_to = "Method",
               values_to = "Scores")
# Checking assumptions
# Normality, tested separately for each method
data %>%
  group_by(Method) %>%
  dlookr::normality()

# A tibble: 2 × 5
  variable Method           statistic p_value sample
  <chr>    <chr>                <dbl>   <dbl>  <dbl>
1 Scores   classroom_scores     0.955   0.732     10
2 Scores   online_scores        0.974   0.928     10

The p-value is above 0.05 in both groups, so the test finds no evidence against normality and the normality assumption is reasonable.
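The same check can be sketched in base R. This assumes that the statistic reported by dlookr::normality() is a Shapiro-Wilk statistic, which the output columns suggest; the variable names sw_classroom, sw_online and sw_p_values are introduced here for illustration.

```r
# Same scores as in the example above
classroom_scores <- c(78, 82, 69, 71, 85, 79, 77, 68, 74, 81)
online_scores <- c(85, 88, 76, 80, 89, 84, 82, 78, 86, 83)

# Shapiro-Wilk test per group; H0 is that the sample comes from a normal distribution
sw_classroom <- shapiro.test(classroom_scores)
sw_online <- shapiro.test(online_scores)

# Collect the two p-values in a named vector for easy comparison with 0.05
sw_p_values <- c(classroom = sw_classroom$p.value, online = sw_online$p.value)
print(sw_p_values)
```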

Code

# Homogeneity of variance
leveneTest(Scores ~ Method, data = data)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  1.0242 0.3249
      18

The p-value is above 0.05, so there is no evidence that the variances of the two groups differ and the equal-variance assumption is reasonable.
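Had Levene's test suggested unequal variances, one option would be Welch's t-test, which base R's t.test() runs when var.equal = FALSE (its default). The sketch below applies it to the same scores purely for comparison; welch_result is a name introduced here.

```r
# Same scores as in the example above
classroom_scores <- c(78, 82, 69, 71, 85, 79, 77, 68, 74, 81)
online_scores <- c(85, 88, 76, 80, 89, 84, 82, 78, 86, 83)

# var.equal = FALSE (the default) requests Welch's t-test,
# which does not assume equal variances in the two groups
welch_result <- t.test(classroom_scores, online_scores, var.equal = FALSE)
print(welch_result)
```

Because both groups have the same size here, the t statistic matches the pooled test; only the degrees of freedom are adjusted downward.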

Code

# Perform the two-sample t-test (var.equal = TRUE pools the two variances)
t_test_result <- t.test(Scores ~ Method, var.equal = TRUE, data = data)
# Print the result
print(t_test_result)


    Two Sample t-test

data:  Scores by Method
t = -2.9788, df = 18, p-value = 0.008047
alternative hypothesis: true difference in means between group classroom_scores and group online_scores is not equal to 0
95 percent confidence interval:
 -11.425388  -1.974612
sample estimates:
mean in group classroom_scores    mean in group online_scores 
                          76.4                           83.1

Since the p-value (0.008047) is less than the significance level (typically 0.05), we reject the null hypothesis. There is enough evidence to conclude that the mean test scores of the classroom and online groups differ: the online group scored higher on average (83.1 versus 76.4), and the 95% confidence interval for the difference (-11.43 to -1.97) excludes zero.
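To see where these numbers come from, the t statistic and p-value can be recomputed by hand from the pooled variance. This is a minimal sketch using the same scores; the names pooled_var, se_diff, t_stat, deg_free and p_value are introduced here for illustration.

```r
# Same scores as in the example above
classroom_scores <- c(78, 82, 69, 71, 85, 79, 77, 68, 74, 81)
online_scores <- c(85, 88, 76, 80, 89, 84, 82, 78, 86, 83)
n1 <- length(classroom_scores)
n2 <- length(online_scores)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var <- ((n1 - 1) * var(classroom_scores) +
               (n2 - 1) * var(online_scores)) / (n1 + n2 - 2)

# Standard error of the difference in means
se_diff <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic and its degrees of freedom
t_stat <- (mean(classroom_scores) - mean(online_scores)) / se_diff
deg_free <- n1 + n2 - 2

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), deg_free)
print(c(t = t_stat, df = deg_free, p = p_value))
```

The result should agree with the t.test() output above (t = -2.9788, df = 18, p-value = 0.008047).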

Visualizing the Test

Code

ggbetweenstats(
  data = data,
  x = Method,
  y = Scores,
  type = "p",         # parametric test
  var.equal = TRUE    # pooled-variance t-test, matching t.test() above
) + labs(title = "Two-Sample t-Test")

Effect size

Code

library(effectsize)
interpret_hedges_g(-1.27)

[1] "large"
(Rules: cohen1988)

The difference between the means of the two groups is large.
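The effect size itself can be approximated from the raw scores. The sketch below assumes the value passed to interpret_hedges_g() is Hedges' g for two independent groups, and it uses the common small-sample correction 1 - 3 / (4 * df - 1); packages may apply an exact correction, so the last decimal can differ slightly. The names cohens_d, correction and hedges_g are introduced here.

```r
# Same scores as in the example above
classroom_scores <- c(78, 82, 69, 71, 85, 79, 77, 68, 74, 81)
online_scores <- c(85, 88, 76, 80, 89, 84, 82, 78, 86, 83)
n1 <- length(classroom_scores)
n2 <- length(online_scores)
deg_free <- n1 + n2 - 2

# Pooled standard deviation of the two groups
pooled_sd <- sqrt(((n1 - 1) * var(classroom_scores) +
                   (n2 - 1) * var(online_scores)) / deg_free)

# Cohen's d: difference in means in units of the pooled standard deviation
cohens_d <- (mean(classroom_scores) - mean(online_scores)) / pooled_sd

# Hedges' g: Cohen's d shrunk by an approximate small-sample correction
correction <- 1 - 3 / (4 * deg_free - 1)
hedges_g <- cohens_d * correction
print(c(d = cohens_d, g = hedges_g))
```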

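What if the data is not normally distributed?

When the normality assumption does not hold, the Mann-Whitney U test (also called the Wilcoxon rank-sum test) is the usual non-parametric alternative to the independent two-sample t-test.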

Code

# Sample data
group1 <- c(2.1, 2.3, 1.8, 2.5, 2.0)
group2 <- c(3.1, 3.3, 2.8, 3.5, 3.0)
data2 <- data.frame(group1,group2)
data2 <- data2 %>% 
  pivot_longer(cols = c(1,2), names_to = "group",
               values_to = "score")

# Perform the Mann-Whitney U Test
wilcox.test(score ~ group, data = data2)


    Wilcoxon rank sum exact test

data:  score by group
W = 0, p-value = 0.007937
alternative hypothesis: true location shift is not equal to 0

Visualizing the test

Code

ggbetweenstats(
  data = data2,
  x = group,
  y = score,
  type = "nonparametric"
) + labs(title = "Mann-Whitney U test or\nWilcoxon rank sum exact test")

A p-value below 0.05 means we reject the null hypothesis and conclude that there is a significant difference in score between the two groups.

Code

interpret_rank_biserial(-1.00)

[1] "very large"
(Rules: funder2019)

The size of the difference is very large.
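The W statistic, the exact p-value and the rank-biserial correlation can all be reproduced by counting pairs. This is a sketch under the assumption that there are no ties (true for this sample); the names pair_diff, w_stat, exact_p and rank_biserial are introduced here.

```r
# Same data as in the Mann-Whitney example above
group1 <- c(2.1, 2.3, 1.8, 2.5, 2.0)
group2 <- c(3.1, 3.3, 2.8, 3.5, 3.0)
n1 <- length(group1)
n2 <- length(group2)

# W counts the (group1, group2) pairs in which the group1 value is larger
# (a tie would count as 0.5; there are none here)
pair_diff <- outer(group1, group2, "-")
w_stat <- sum(pair_diff > 0) + 0.5 * sum(pair_diff == 0)

# With complete separation, only 2 of the choose(10, 5) equally likely
# rank arrangements are this extreme (one in each direction)
exact_p <- 2 / choose(n1 + n2, n1)

# Rank-biserial correlation: share of pairs favouring group1
# minus share of pairs favouring group2
rank_biserial <- 2 * w_stat / (n1 * n2) - 1

print(c(W = w_stat, p = exact_p, r = rank_biserial))
```

Every value in group1 is smaller than every value in group2, which is why W is 0 and the rank-biserial correlation sits at its extreme of -1.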

Paired t-Test

Explanation of the paired t-test

The paired t-test compares the means of two related groups, often measurements taken before and after an intervention. Here we use two examples: the number of sit-ups before and after a physical fitness course, and life expectancy in African countries in 1952 versus 2007.

Set up Hypothesis

Null Hypothesis (H0): mean(Before) is equal to mean(After).

Alternative Hypothesis (H1): mean(Before) is not equal to mean(After).
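In formula terms, the paired test works on the per-subject differences rather than on the two groups separately. The notation below is introduced here for illustration: $d_i$ is the difference for subject $i$, $\bar{d}$ and $s_d$ are the mean and standard deviation of the differences, and $n$ is the number of pairs.

```latex
d_i = x_{i,\text{After}} - x_{i,\text{Before}},
\qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}},
\qquad
\text{df} = n - 1
```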

Required packages

Code

library(BSDA)       # for the Fitness dataset
library(gapminder)  # for the gapminder dataset

Assumptions

The differences between paired observations should be approximately normally distributed. Let’s check this for the sit-ups data.

Code

df <- Fitness %>% pivot_wider(names_from = test,
                             values_from = number) %>% 
  mutate(Difference = After - Before)
shapiro.test(df$Difference)


    Shapiro-Wilk normality test

data:  df$Difference
W = 0.95124, p-value = 0.7037

A p-value above 0.05 means there is no evidence that the differences depart from normality, so we proceed with the parametric paired-sample t-test.

Conducting the Tests

Code

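# Paired t-test on the long-format data.
# Note: R 4.4.0 and later reject `paired` in the formula interface;
# there, t.test(df$After, df$Before, paired = TRUE) should give the same test.
Fitness %>% 
  t.test(number ~ test, data = ., paired = TRUE)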


    Paired t-test

data:  number by test
t = 2.753, df = 8, p-value = 0.02494
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.3247268 3.6752732
sample estimates:
mean difference 
              2
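A paired t-test is the same as a one-sample t-test on the differences, which is why the normality check above was run on the Difference column. The sketch below illustrates this with hypothetical before/after counts (not the Fitness data); the names before, after, paired_result and diff_result are introduced here.

```r
# Hypothetical sit-up counts for six subjects (illustration only)
before <- c(20, 25, 30, 22, 28, 26)
after <- c(23, 26, 33, 25, 27, 30)

# Paired t-test on the two vectors
paired_result <- t.test(after, before, paired = TRUE)

# One-sample t-test on the per-subject differences against a mean of 0
diff_result <- t.test(after - before, mu = 0)

# Both approaches give the same t statistic and p-value
print(c(paired = unname(paired_result$statistic),
        one_sample = unname(diff_result$statistic)))
print(c(paired = paired_result$p.value,
        one_sample = diff_result$p.value))
```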

Conducting and visualizing the test

Code

ggwithinstats(data = Fitness,
              x = test,
              y = number,
              type = "parametric"
              )+
  labs(title = "Paired Sample T Tests")

Interpretation

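A p-value below 0.05 (p = 0.025) indicates that the mean number of sit-ups changed after the physical fitness course. The mean difference of 2 (95% confidence interval 0.32 to 3.68) points to an improvement.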

Effect size

Code

interpret_hedges_g(0.83)

[1] "large"
(Rules: cohen1988)

The difference in means (Before and After) is large.
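The value 0.83 can be recovered from the paired t-test output. This sketch assumes it is Hedges' g for paired data, that is, the mean difference divided by the standard deviation of the differences, with the approximate small-sample correction 1 - 3 / (4 * df - 1). The names t_value, n_pairs, d_z and g_paired are introduced here.

```r
# Values taken from the paired t-test output above
t_value <- 2.753   # t statistic
n_pairs <- 9       # number of pairs (df + 1)

# Standardized mean difference for paired data: d_z = t / sqrt(n)
d_z <- t_value / sqrt(n_pairs)

# Approximate small-sample correction with df = n - 1
correction <- 1 - 3 / (4 * (n_pairs - 1) - 1)
g_paired <- d_z * correction

print(round(g_paired, 2))
```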

Example 2

We want to test whether life expectancy in African countries was the same in 1952 and 2007.

Set up hypothesis

Null Hypothesis (H0): Life expectancy was the same in 1952 and 2007.

Alternative Hypothesis (H1): Life expectancy differs between the two years.

Preparing the data

Code

a <- gapminder %>% 
  filter(year %in% c(2007,1952) & continent == "Africa")

Checking for the Assumption

Code

a %>% group_by(year) %>% 
  dlookr::normality(lifeExp) %>% flextable::flextable()

variable	year	statistic	p_value	sample
lifeExp	1952	0.9793928	0.50034098	52
lifeExp	2007	0.9398765	0.01107002	52

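The p-value for 2007 is below 0.05, so normality is doubtful for that year and we use a non-parametric test, the Wilcoxon signed-rank test. Strictly, the paired t-test's assumption concerns the paired differences, as checked in the first example; the per-year check here is a rougher screen.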

Conducting and visualizing the test

Code

ggwithinstats(
  data = a,
  x = year,
  y = lifeExp,
  type = "nonparametric"
) + labs(title = "Wilcoxon signed-rank test (non-parametric paired test)",
        y = "Life expectancy",
        caption = "Data source: https://www.gapminder.org/data/")

Interpreting p value

A p-value below 0.05 indicates that life expectancy differs between the two years, with life expectancy in 2007 higher than in 1952.
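The test behind this plot can be sketched in base R with wilcox.test() and paired = TRUE. The six values per year below are hypothetical (they are not the gapminder data), and the names life_1952, life_2007 and signed_rank are introduced here for illustration.

```r
# Hypothetical life expectancies for six countries (illustration only)
life_1952 <- c(38.2, 41.5, 39.0, 44.1, 36.7, 42.3)
life_2007 <- c(52.9, 56.7, 50.4, 59.8, 49.6, 61.2)

# Wilcoxon signed-rank test on the paired differences (2007 minus 1952)
signed_rank <- wilcox.test(life_2007, life_1952, paired = TRUE)
print(signed_rank)
```

All six differences are positive, so the statistic V, the sum of the ranks of the positive differences, is 1 + 2 + ... + 6 = 21, and the exact two-sided p-value is 2 / 2^6 = 0.03125.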

Effect size

Code

interpret_rank_biserial(0.98)

[1] "very large"
(Rules: funder2019)